(57) Abstract: A next-generation voice processing system (NGVPS) is provided. Voice-processing blocks within prior art systems
have been opened up, revealing common functions and inter-block dependencies. By opening up and consolidating portions of these
blocks, the NGVPS enhances the functionality of some functions by using processing and signals that were previously only available
to a single block. By taking into account the interaction of these various sub-systems and elements, the NGVPS provides the best 
overall voice performance. This holistic approach provides new implementations for optimizing voice processing from an end-to-end 
systems approach. 
INTEGRATED VOICE PROCESSING SYSTEM FOR PACKET NETWORKS

Technical Field 

The present invention is principally related to voice processing systems and, in
particular, to a next generation voice processing system (NGVPS) designed specifically for
voice-over-x systems and a wider class of voice processing applications.

Cross-Reference To Related Applications 

The present application claims priority from U.S. Patent Application Serial No.
60/163,350 entitled "INTEGRATED VOICE PROCESSING SYSTEM FOR
COMMUNICATION NETWORKS" filed on November 3, 1999 and from U.S. Patent
Application Serial No. 60/224,398 entitled "NOISE INJECTING SYSTEM" filed on August 10,
2000, both assigned to the same assignee as the present invention.

The teachings of U.S. Patent Nos. 5,721,730; 5,884,255; 5,561,668; 5,857,167 and
5,912,966 are hereby incorporated by reference.

Background Of The Invention 

Voice quality is critical to the success of voice-over-x (e.g., Voice-Over-IP) systems,
which has led to complex, digital signal processor (DSP) intensive, voice processing
solutions. For the so-called new public network to be successful in large-scale voice
deployment, it must meet or exceed the voice quality standards set by today's time division
multiplex (TDM) network. These systems require a combination of virtually all known
single source voice processing algorithms, which include but are not limited to the following:
echo cancellation, adaptive level control, noise reduction, voice encoders and decoders (or
codecs), acoustic coupling elimination and non-linear processing, voice activity detectors,
double talk detection, signaling detection-relay-and-regeneration, silence suppression,
discontinuous transmission, comfort noise generation and noise substitution, lost packet
substitution/reconstruction, and buffer and jitter control. The current generation of voice
solutions for packet networks has addressed this complex need by obtaining and plugging
together separate voice sub-systems. Suppliers of these systems have concentrated their
efforts on obtaining and creating each of the various blocks and making the blocks work
together from an input-output perspective. During the integration process each of the
functions has effectively been treated as a black box. As a result, the sub-systems have
been optimized only with regard to their own function and not with respect to the complete
system. This has led to an overall sub-optimal design. The resulting systems have
reduced voice quality and require more processing power than an integrated approach that
has been optimized from a system perspective.

FIG. 1 shows a typical "black box" block diagram. The following abbreviations are
used in FIG. 1: NR: noise reduction; ALC: automatic level control; ENC: speech encoder;
FE: far end speaker; EC: echo canceller; SS: silence suppressor; NS: noise substitution; DEC:
speech decoder; and NE: near end speaker. As shown, a transmitted voice signal 102 is
processed by the echo canceller, and the pulse code modulated (PCM) output of the canceller
is simply forwarded to the optional noise reduction unit, and then on to the auto level control
unit, and then on to the codec, etc. A similar path is provided for received voice signals 104.

The problem with this method of simply plugging together DSP boxes is that it does 
not take into account the interactions of the elements within the boxes. FIG. 2 shows some of 
the individual elements within the subsystems in the voice-over-x DSP system of FIG. 1. A 
feel for the problem can be attained from some examples; a couple of the subsystem elements
that can lead to sub-optimal voice quality are examined here.

In typical fashion, a non-linear processor (NLP) is included within the echo
cancellation block. The NLP is a post-processor that eliminates the small amount of residual
echo that is always present after the linear subtraction of the echo estimate. One artifact of
the NLP is that it can distort background noise signals. Also shown in FIG. 2 are some of the
components inside the noise reduction (NR) block. The NR sub-system must generate a
background noise estimate. If the NR block is not aware of the distortion introduced by the
NLP, it will improperly identify the background noise, resulting in lower performance. As is
also known in the art, there is a background noise estimate function within the speech coder
subsystem. This estimate is sent to the far end voice-over-x system when the near end
speaker is silent. Both the NLP and the NR block would also adversely affect this noise
estimate if their actions were not taken into account.

Another interaction problem can occur with the voice activity detectors (VAD) shown
in FIG. 2. The goal of the VAD is to accurately detect the presence of either NE or FE
speech. If speech is present, then the associated processing of the ALC, NR, or speech coder
is performed. The echo canceller's double talk detector (DTD) is another form of VAD. It
must detect both NE and FE speech and control the canceller so that it only adapts when NE
speech is absent. Interaction between elements such as the NLP, NR, or changes in the
ALC can negatively affect the accuracy of the downstream VAD. For example, losses in the
NLP or NR subsystems may falsely trigger the speech encoder to misinterpret voice as
silence. This would cause the codec to clip the NE speech, which would degrade voice
quality. Similar issues exist with regard to the VAD in the ALC block.

Thus, a need exists for an improved voice processing system that does not suffer from 
the interactive shortcomings of prior art solutions. 

Summary Of The Invention 

The present invention provides a next-generation voice processing system (NGVPS)
designed with the overall system in mind. Each voice-processing block has been opened up,
revealing common functions and inter-block dependencies. By opening up these blocks, the
NGVPS also enhances the functionality of some functions by using processing and signals
that were previously only available to a single block. By taking into account the interaction of
these various sub-systems and elements, the NGVPS provides the best overall voice
performance. This holistic approach provides new means for optimizing voice processing
from an end-to-end systems approach. This will be an important factor in the success of the
new network.

A more system-wide optimization approach is described herein. This approach takes
into account the interaction of the various sub-systems and elements to provide the best
overall voice performance. For the so-called new public network to be successful in
large-scale voice deployment, it must meet and should exceed the voice quality standards set by
today's TDM network. Therefore, optimizing voice processing from an end-to-end systems
approach is a critical success factor in new network design.

The system-wide, integrated voice processing approach of the present invention also
creates opportunities for further enhancements by reordering the sub-blocks that make
up the various blocks. For example, work has been conducted in the past on sub-band NLPs
for echo cancellers. However, the significant processing required to create the sub-bands has
typically outweighed the performance improvements.
An NR system, on the other hand, typically divides the signal into sub-bands in order to perform its
operations. Opening up these blocks facilitates a system in which the EC's NLP can be
moved to the sub-band part of the NR system. Thus, the performance improvement may be
gained with very little additional processing.

The new public network concept, which is based on packet voice, requires this type of
processing at each point of entry and departure from the network. Establishing a more
integrated system, having the best performing processing elements at these points, is one of
the objectives of the present invention. The present invention may be applicable to voice
band enhancement products or voice-over-x products. Additional applications that could
benefit from the present invention include any other products carrying out voice processing.

Brief Description Of The Drawings 

In the detailed description of presently preferred embodiments of the present
invention which follows, reference will be made to the drawings comprised of the following
figures, wherein like reference numerals refer to like elements in the various views and
wherein:

FIG. 1 is a block diagram of a voice processing system in accordance with prior art
techniques;

FIG. 2 is a block diagram illustrating various blocks of the voice processing system of
FIG. 1 in greater detail;

FIG. 3 is a block diagram of a voice processing system in accordance with the present
invention;

FIG. 4 is a block diagram of an echo canceller and noise reduction circuit in
accordance with prior art techniques and to which the present invention may be beneficially
applied;

FIG. 5 is a block diagram of a noise injection system in accordance with one
embodiment of the present invention; and

FIG. 6 is a block diagram of a duo echo canceller system in accordance with another
embodiment of the present invention.

Detailed Description of the Invention 
1. An Integrated Approach 

Higher levels of voice quality can be achieved if the interactions of the elements
within the boxes are considered and an integrated design approach is taken. The NGVPS
in effect opens these blocks, combining and enhancing common functions. This
approach also eliminates inter-block dependencies. As a result of taking into account the
interaction of these various sub-systems and elements, the NGVPS provides improved voice
performance with less processing. In addition to improving common functions, the NGVPS
enhances overall functionality by using processing and signals that were previously only
available within a single block for multiple functions.

2. A Consolidated Multifunction Voice Activity Detector 
A block diagram of an integrated voice-over-x DSP system in accordance with the
present invention is shown in FIG. 3. As those having ordinary skill in the art will recognize,
various features of the system can be implemented in hardware, software, or a combination of
hardware and software. For example, some aspects of the system can be implemented in
computer programs executing on programmable computers. Each program can be
implemented in a high level procedural or object-oriented programming language to
communicate with a computer system. Furthermore, each such computer program can be
stored on a storage medium, such as read-only-memory (ROM) readable by a general or
special purpose programmable computer, for configuring and operating the computer when
the storage medium is read by the computer to perform the functions described above. Note
that there are a variety of signal types illustrated in FIG. 3. Speech signals (preferably in
digital form) are represented by heavy solid lines; signal estimates, representative of various
qualities of the voice signals, are illustrated using dashed lines; control signals are illustrated
using solid lines; and algorithmic parameters, representative of internal values calculated by
the various voice processing blocks, are illustrated using heavy dashed lines.

Transmitted voice signals 102 are provided to an echo canceller having an adder 304
and echo estimator 306. The resulting signals are then passed to a noise reduction circuit 308
and a non-linear processor 310. Collectively, the echo canceller, noise reduction circuit 308
and NLP 310 form an integrated echo and noise reduction section. The output of the NLP
310 is sent to an ALC 312 and then to buffering 314 and a speech encoder 316. It should be
noted that a centralized buffer (not shown) is preferred over separate buffers associated with
particular voice processing blocks (e.g., the buffering 314 associated with the speech encoder
316). In this manner, the various voice processing operations may be sequentially performed
on audio data stored in the buffer. However, the centralized buffer has not been illustrated in
FIG. 3 for the sake of simplicity. Similarly, the echo canceller functionality and the speech
encoder 316 are preferably integrated, although they are shown as being separate in FIG. 3.
The elements described above collectively form a transmit signal processing section of the
overall integrated system, as shown in FIG. 3. Note that the term "circuitry" and its
derivatives are used throughout this description as a means of describing various functional
elements shown in the figure. However, use of this term should not be construed as a
limitation to the manner in which such elements may be implemented, i.e., as hardware
circuits.

The various blocks within the control processing section of the integrated system
receive inputs from and provide outputs to the various blocks in the transmit signal
processing section. Such signals are well known to those having ordinary skill in the art and,
where necessary, are discussed below. Within the control processing section, a centralized
voice activity detector 330 and a centralized noise estimator 332 are provided. As shown,
these blocks are coupled to a residual estimator 334 (for assessing the amount of residual
echo left in the transmit signal 102 after echo cancellation), a near end signal estimator 336, a
near end gain controller 338 and a framing controller 340. As shown, the centralized noise
estimator 332, the residual estimator 334, the near end signal estimator 336, the near end gain
controller 338 and the framing controller 340 are associated with the transmit signal
processing section. However, the control processing section also comprises a far end signal
estimator 342 and a far end gain controller 344 associated with a receive signal processing
section.

The receive signal processing section takes received audio signals 104 as input. A
lost packet handler 360 is provided to mitigate the effects of lost packets on the received
audio. The speech decoder 362 converts the received audio signal from a parameterized or
other representative form to a continuous speech stream. The received speech is then
provided to an ALC 364. Note that the redundant blocks illustrated in FIG. 2 have been
consolidated in the single control block in FIG. 3. Examples of consolidated and enhanced
functions include the VADs and the background noise estimators.

Almost all of the blocks in FIG. 2 have some form of Voice Activity Detection
(VAD) circuitry built into them. The NR sub-system needs to know when speech is absent so
that it can update its estimate of the background noise. NR also needs to know when speech
is present so that it can adjust gains and calculate signal powers. The ALC block needs to
know when speech is present so that it can get a good reading of the voice signal levels. The
echo canceller uses a form of VAD called a double talk detector (DTD) to reduce the
influence of uncorrelated signals and thus improve its estimate of the echo. The speech
encoder and accompanying silence suppressor use a VAD to detect silence, which triggers a
reduction in the rate of transmitted packets (i.e., during silence the codec periodically outputs a
description of the silence/background noise). The integrated approach creates a common
VAD that reduces the complexity of the product and, in turn, increases density and reduces
cost. In addition, the consolidated VAD performs more accurately than the individual VADs.

Higher performance is the result of several factors. First, interaction problems that
can occur when multiple voice activity detectors (VADs) are used can be avoided. In a
cascade, each block increases the likelihood that the subsequent blocks' VADs will misinterpret
speech as silence or silence as speech. Additionally, the problem of cascading errors is avoided.
Certain problem cases can cause a single block to perform incorrectly on a segment of speech or
silence. In the multiple VAD case, this can have a cascading effect as the subsequent blocks'
VADs trigger errantly.

The goal of the VAD is to accurately detect the presence of either NE or FE speech.
If speech is present, then the associated processing of the ALC, NR, or speech coder is
performed. The echo canceller's double talk detector (DTD) is another form of VAD. It
must detect both NE and FE speech and control the canceller so that it only adapts when NE
speech is absent. Interaction between elements such as the NLP, NR, or changes in the
ALC can negatively affect the accuracy of the downstream VAD. For example, losses in the
NLP, NR, or ALC subsystems may falsely trigger the speech encoder to misinterpret voice as
silence. This would cause the codec to clip the NE speech, which would degrade voice
quality. Similarly, losses in the NLP or NR subsystems could cause the VAD in the ALC
block to perform errantly. Of course, the loss in the NLP could likewise cause the NR
subsystem to perform incorrectly, thereby suppressing voice. This problem would then
cascade into all subsequent blocks. These problems are further accentuated by the various
hold-over or hangover counters and the increased number of possible voice activity states in
more sophisticated NR systems. An NR system can be established that uses a probability of
speech presence measure to control the algorithm instead of a simple threshold.

A second factor in the VAD's performance enhancement is that it uses metrics from
several of the blocks that would otherwise only be visible to a single block. The consolidated
VAD (CVAD) uses performance measures from the echo canceller block such as Echo
Return Loss Enhancement (ERLE), along with typical VAD measures (e.g., RMS power and
zero-crossings) for both transmit and receive voice signals. The CVAD also uses the spectral
properties and formant information from the noise reduction algorithm and speech encoder.
The other speech encoder parameters are also used to help determine voice activity. The
encoder's pitch predictor provides a powerful indicator of the presence of voiced speech and
is used to further improve the CVAD. Those having ordinary skill in the art are familiar with
these metrics and their use in implementing VADs.
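
As a rough illustration of the "typical VAD measures" named above, the following sketch computes RMS power and a zero-crossing count for one frame of PCM samples. The function names and the frame representation (a plain list of numbers) are assumptions made for this example; they are not taken from the specification, and a real CVAD would combine these with ERLE, spectral and pitch information.

```python
import math


def frame_rms(frame):
    """Root-mean-square level of one frame of samples (hypothetical helper)."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossings(frame):
    """Count sign changes between consecutive samples (hypothetical helper).

    Zero is treated as non-negative, so a run of zeros adds no crossings.
    """
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def frame_features(frame):
    """Bundle the two measures so a VAD decision stage can consume them."""
    return {"rms": frame_rms(frame), "zero_crossings": zero_crossings(frame)}
```

Voiced speech tends to show high RMS with relatively few zero crossings, while unvoiced speech and noise tend to show more crossings, which is why the two measures are commonly used together.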

A third factor in the CVAD performance enhancement is that it controls all of the
hold-over and voice states for each of the subsystems. A hold-over function is commonly
added to a VAD to improve the system's performance for unvoiced speech by preventing
state changes until a predetermined period of time has expired. The use of multiple voice
states is a VAD enhancement that is part of a proprietary adaptive noise cancellation (ANC)
algorithm of Tellabs, which is used for noise reduction. Centralizing the control of these
interacting enhancement functions prevents unstable inter-block interaction. Hence, with the
CVAD, both of these VAD enhancements can be optimized for each subsystem without
having a detrimental effect on other sub-systems.

Similarly, the speech presence sensitivity requirements of each block differ. For
instance, if given a choice between having the speech coder fail to recognize silence and
having it perform silence suppression procedures during low-level speech, the former would be the
obvious choice. Thus, the speech coder requires high speech sensitivity. Some of the other
functions such as EC and NR can generally accommodate a less sensitive VAD, and benefit
from a multi-level speech probability measure. For instance, the EC can slow the adaptation
of its taps as the probability of speech presence measure approaches the DTD threshold. And
as previously mentioned, an NR system can be established that uses a probability of speech
presence measure to control the algorithm instead of a simple threshold.
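
One way to picture the EC slowing its tap adaptation is a step size that tapers as the speech probability approaches the DTD threshold. The sketch below assumes a linear taper and a threshold of 0.5; both the function name and these choices are illustrative assumptions, since the specification does not state how the slowdown is computed.

```python
def adaptation_step(base_step, p_speech, dtd_threshold=0.5):
    """Scale the adaptive filter step size by near-end speech probability.

    Assumed behaviour (not from the specification): full step at
    p_speech == 0, linear taper, and adaptation frozen at or above the
    double-talk threshold.
    """
    if p_speech >= dtd_threshold:
        return 0.0
    return base_step * (1.0 - p_speech / dtd_threshold)
```

A binary DTD would jump straight from `base_step` to zero; the graded version lets the canceller keep adapting cautiously when the evidence for near-end speech is weak.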

In order to accommodate the different speech presence sensitivity requirements, the
CVAD provides appropriate voice activity signals to the different blocks, although the VAD
processing is integrated. For instance, the CVAD would normally provide just a binary
speech present or absent signal to the speech coder, while a multi-level or probability of
speech presence measure is provided to the other blocks. These three CVAD factors combine
to create a high-performance VAD, which produces a powerful improvement in overall system
performance.
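
The idea of one detector feeding different consumers can be sketched as follows. The class name, the low coder threshold (chosen so the coder errs toward declaring speech) and the frame-count hold-over are all assumptions made for illustration; the specification does not give values or an interface.

```python
class ConsolidatedVadOutputs:
    """Derive per-consumer signals from one speech-probability estimate.

    Hypothetical sketch: the speech coder receives a binary flag with a
    hold-over (hangover) period, while the EC and NR receive the raw
    probability.
    """

    def __init__(self, coder_threshold=0.2, hangover_frames=5):
        self.coder_threshold = coder_threshold
        self.hangover_frames = hangover_frames
        self._hang = 0  # frames of hold-over remaining

    def update(self, p_speech):
        """Return (binary flag for the coder, probability for other blocks)."""
        if p_speech >= self.coder_threshold:
            self._hang = self.hangover_frames  # re-arm the hold-over
            active = True
        elif self._hang > 0:
            self._hang -= 1  # keep reporting speech during hold-over
            active = True
        else:
            active = False
        return active, p_speech
```

Because the hold-over counter lives in one place, tuning it for the coder does not change what the EC or NR see.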

3. Integrating EC and NR Functions 

The interaction between self-optimized processing blocks can result in sub-optimal
overall performance. This can be particularly pronounced for the EC function's NLP and the
noise reduction function. The degradation is most severe when ERLE is poor, which is the case
when the NLP is used without the EC. The result is an intermittent choppiness in the speech
and background noise.

By integrating the EC and NR functions, a significantly improved system is
created. Integrating these two functions facilitates a reordering of the NLP and the NR
subsystems. In the NGVPS, the NR subsystem is placed between the EC and the EC's NLP.
This is important to speech quality, as the nonlinear nature of the NLP affects the NR system
in a dramatic way. When the NLP is placed before the NR function, the NLP can change the
noise location and affect its level at various frequencies in a time varying fashion that is
difficult to track in the NR system. This is because most of the NR system's noise estimates
are performed during silence, but used during speech. This makes NR systems susceptible to
time varying noise backgrounds, particularly with regard to spectral content. Additionally,
the NLP with its associated noise injection process may have different background noise
levels when speech is present compared to when speech is absent. This is effectively a time
varying noise source, which would degrade NR performance in a typical voice processing
system (VPS).

The integrated system places the NR function between the EC and the NLP. It also
uses a central noise and signal estimate as described in Section 4. The estimates are adjusted
to compensate for the effect of the NR system in the control of the NLP. The NR system
reduces noise by a fixed factor during times of voice inactivity.

It has been shown that improved NLP performance is realized when the NLP operates
in the sub-band domain. However, sub-band NLPs are rarely used due to the cost of creating
the sub-band signal, both in real dollars and in processing power and delay. The
NGVPS offers this sub-band option by further integrating the NLP into each of the NR
system's sub-bands. These sub-bands are created as part of the noise reduction function.
Hence, by integrating these two functions together, performance can be gained without the
added cost. The sub-band NLP further improves performance, but the integrated EC and NR
approach out-performs the black-box approach even without this further enhancement.

In one arrangement, the voice processing blocks include an echo canceller, noise
reduction block and level adjustment block. Each of those blocks makes a gain adjustment to
the input signal. Normally this is done by each block independently. A preferred
implementation involves computing the adjustments individually in each block but then
adjusting the signal once per the combined adjustment calculations in one central adjustment
block, function or location.
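
The single-adjustment arrangement can be sketched as summing the per-block gains in decibels and scaling the samples once. The function name and the use of dB values are assumptions for this example; the specification only says the adjustments are computed individually and applied in one central place.

```python
def apply_gain_once(samples, gains_db):
    """Apply the combined gain of several blocks in a single multiply.

    `gains_db` holds one gain per block (for example EC, NR and ALC), each
    computed independently; gains in dB add, so the linear factor is
    10 ** (sum / 20). Hypothetical helper for illustration only.
    """
    linear = 10.0 ** (sum(gains_db) / 20.0)
    return [s * linear for s in samples]
```

For example, `apply_gain_once(frame, [-3.0, -6.0, 4.0])` scales the frame by a net -5 dB in one pass, rather than rescaling (and potentially re-quantizing) the signal three times.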

4. Centralized Noise and Signal Estimates

Contrast once again the block diagram of an integrated voice-over-x DSP system as
shown in FIG. 3 with the system shown in FIG. 2. The multiple signal estimators of FIG. 2
have been consolidated into a single signal estimator in the control block. Likewise, the
multiple noise estimators of FIG. 2 have been consolidated into a single noise estimator in the
control block.

The signal estimator is very closely related to some parts of the consolidated VAD
(CVAD) function and should perhaps be shown as part of the VAD. This consolidated signal
estimator includes both broadband and sub-band signal estimates. The majority of the
processing power associated with creating the sub-band estimates is actually part of the NR
process. Similarly, the majority of the processing power for the broadband estimate can be
considered to be part of an ordinary VAD. These calculations can now be shared by the new
high performance CVAD as well as the NR and ALC subsystems.

The various background noise estimates are consolidated into a single background
noise estimate. This background noise estimate is actually a set of estimates, some
broadband and some sub-band, but is referred to in the singular to avoid confusion with the
unconsolidated estimates. This estimate is derived from the transmit signal just after the
near-end echo estimate is subtracted by the canceller. The consolidated noise estimate serves
as the background noise estimate to the NLP subsystem (for background noise transparency,
also known as comfort noise injection), the NR subsystem (for spectral subtraction of
background noise), and the speech encoder (to send silence descriptors during silence). It is
also shared by the VAD to help it avoid false triggers resulting from noise and to more
accurately calculate the probability of speech being present. Using the signal out of the echo
subtraction block improves the quality of this noise estimate, as the estimate is taken before
performing other processing that would corrupt the estimate. This improves the quality of
the entire system. For example, the improved background noise estimate can be used in the
NR, which, in turn, increases the amount of noise reduction and reduces any artifacts or
distortion in the speech. Distorted speech is even more difficult to model in the codec, so it,
in turn, would add more distortion. The silence suppressor uses a version of the noise
estimate that has been modified to account for the effect of the NR system. This improves
the accuracy of the silence suppressor and reduces the noise modulation.
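
A single shared estimate with a modified view for the silence suppressor might look like the sketch below. The exponential smoothing, the class and method names, and the treatment of the NR effect as a fixed attenuation in dB are assumptions chosen for illustration; the specification does not describe the estimator's internals, and the real estimate is a set of broadband and sub-band values rather than one number.

```python
class CentralNoiseEstimator:
    """One broadband noise-power estimate shared by several consumers.

    Hypothetical sketch: the estimate is updated only while the VAD reports
    no speech, using simple exponential smoothing.
    """

    def __init__(self, alpha=0.9, initial=0.0):
        self.alpha = alpha      # smoothing factor, closer to 1 is slower
        self.power = initial    # current noise power estimate

    def update(self, frame_power, speech_present):
        """Fold in a frame's power taken just after echo subtraction."""
        if not speech_present:
            self.power = self.alpha * self.power + (1.0 - self.alpha) * frame_power
        return self.power

    def for_silence_suppressor(self, nr_attenuation_db):
        """Estimate adjusted for an assumed fixed NR attenuation."""
        return self.power * 10.0 ** (-nr_attenuation_db / 10.0)
```

The NLP, NR and VAD would read `power` directly, while the silence suppressor reads the adjusted value so that its silence descriptors reflect the noise level the far end actually hears after noise reduction.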

The quality of the noise often distinguishes one VBE system from the next. On 
average, speech is active less than 50% of the time, in a given direction. 

5. Consolidated Noise Injection

In telephony digital PCM systems, the analog signal is sampled 8000 times per second
and converted to an 8-bit digital A-law or μ-law encoded signal. Voice Processing Systems
interface with this PCM encoded digital data stream. An echo canceller is one such device
that adapts to the impulse response of the near-end transmission facility and produces an echo
estimate by multiplying this impulse response by the signal from the far end. This echo
estimate is subtracted from the near-end signal, producing a signal which has the echo
component removed. This process is not exact because of the quantization distortion of the
A-law and μ-law encoding processes. This quantization distortion limits the echo return loss
enhancement (ERLE) to approximately 33 dB even if all other processes are perfect. This still
leaves a residual echo signal that is perceptible to the far-end talker. Historically, this
problem is solved within the echo canceller design by passing the signal through a non-linear
processor (NLP). The function of the NLP is to remove or attenuate the residual echo
component of the signal so that it is no longer perceptible to the far-end talker.
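
The echo estimate and the ERLE figure discussed above can be sketched as follows. The sketch interprets the combination of impulse response and far-end signal as a finite impulse response (FIR) convolution, which is the usual reading but is an assumption here, as are the function names; ERLE is computed in the conventional way as the ratio of signal power before and after cancellation, in dB.

```python
import math


def echo_estimate(far_end, impulse_response):
    """Convolve the far-end signal with the estimated impulse response.

    Output has the same length as `far_end`; samples before the start of
    the signal are treated as zero. Hypothetical helper for illustration.
    """
    out = []
    for n in range(len(far_end)):
        acc = 0.0
        for k, h in enumerate(impulse_response):
            if n - k >= 0:
                acc += h * far_end[n - k]
        out.append(acc)
    return out


def erle_db(before, after):
    """Echo return loss enhancement: power before vs. after cancellation."""
    p_before = sum(x * x for x in before)
    p_after = sum(x * x for x in after)
    if p_after == 0.0:
        return math.inf  # perfect cancellation in this idealized sketch
    return 10.0 * math.log10(p_before / p_after)
```

With exact arithmetic and a perfect impulse response the residual would be zero; the roughly 33 dB ceiling cited in the text arises because the companded PCM samples carry quantization distortion that the linear filter cannot model.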

One issue with the use of NLPs is apparent where high non-linearity (from acoustic
echo) and background noise are present. When the far end user speaks, their voice energy
drives the NLP to operate, thereby removing the residual echo. At the same time, however, the
far end user also hears the background noise muting, an effect known as background noise
modulation. This is particularly obnoxious to the far end speaker if there is a perceptible
delay between the far-end and near-end telephones, since this modulation effect is not
covered up by the sidetone applied to his own earpiece.

Historically, one solution to enhancing "background transparency" is to add "comfort
noise" that matches the level of the idle channel noise when the center clipper is active. One
approach for accomplishing this is described in United States Patent No. 5,157,653 issued in
the name of Roland Center, the teachings of which are hereby incorporated by this reference.
This works in most instances, causing this noise modulation effect to be essentially
non-perceptible to the far-end listener. The key, however, is the close spectral matching of the
comfort noise to the idle channel noise, which requires additional processing power in any
system.
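
A minimal level-matched form of comfort noise injection is sketched below: while the NLP (center clipper) is active, the output sample is replaced by white Gaussian noise at the estimated idle-channel level. The function name and the use of white noise are assumptions for illustration. White noise matches only the level, not the spectrum, so this sketch does not achieve the close spectral matching that the text identifies as the key requirement.

```python
import random


def nlp_with_comfort_noise(sample, nlp_active, noise_rms, rng):
    """Pass the sample through, or substitute level-matched white noise.

    `rng` is a random.Random instance supplied by the caller so that the
    noise sequence is reproducible. Hypothetical helper for illustration.
    """
    if nlp_active:
        return rng.gauss(0.0, noise_rms)
    return sample
```

For example, `nlp_with_comfort_noise(s, True, 12.0, random.Random(0))` returns a noise sample with a standard deviation of 12 in place of `s`, so the far-end listener hears a continuous noise floor instead of muting.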

The present invention contemplates how another aspect of a voice processing system,
such as the noise reduction system element as a specific example, can be used during its
otherwise "idle" time to provide virtually non-perceptible insertion of a derived noise signal
into the gaps created during NLP operation.
While it may be possible to design an NLP to remove significant non-linear
"echo-artefacts" (as may be found in the tail circuit of a mobile cellular telephony network, for
example) without disturbing the background noise, it is considered that the processing power
required to effectively achieve such puts this solution out of the reach of a practical system.
The present invention limits or altogether circumvents any such onerous requirement by
keeping the NLP basic and using otherwise spare processing power.

Referring now to FIG. 4, there is illustrated an exemplary echo canceller (EC) and
noise reduction (NR) system in accordance with prior art techniques to which the present
invention may be applied as described below. In general, the operations of the echo canceller
filter, the NLP, and the noise reduction filter are well published and known to those of
ordinary skill in the art, and therefore need not be described in substantial detail herein.
Accordingly, the focus of the following discussion will be on the technique by which system
elements and/or characteristics and/or resources, such as for example the readily accessible
noise reduction processing aspect of the system, can be used to provide a dynamic spectrally
and amplitude matched comfort noise injection signal for insertion into the gaps of signal
created by the NLP in response to far-end speech.

During a telephone call the NLP will be operating when the far-end talker speaks (to
prevent residual echo), and releasing when the near-end talker speaks. During double-talk,
speech is passing in both directions and the NLP is released, but the residual echo remaining
after the echo canceller filter is likely to be below a disturbing level. In consideration of the
near-end speech scenario, during this time the noise reduction processor will be converging
on the stationary content of the background noise, this being the part of a noise signal for
which the amplitude and spectrum remain constant over some seconds.

In the next instance the far-end talker will respond to the near-end talker and the echo
canceller filter algorithm decides whether the NLP should be operated or not (low to medium
near-end noise, or high near-end noise condition respectively). If the NLP is operated then
the residual echo and any near-end noise will be muted, giving rise to a background noise
modulation effect perceived by the far-end. In an alternate (and for claim construction, an
equivalent) embodiment, in other NLP operations, residual echo and any near-end noise
might be compressed, scrambled, or compressed and scrambled, or clipped or passed through
unprocessed. From experience, perception of the modulation effect by the far-end user is
increased if delay over the telephone circuit is increased (>40 ms round trip delay). The
overall effect is quite disturbing. Background noise modulation can be an issue wherever the
speech path is interrupted, which is why the techniques described herein are equally useful in
systems employing discontinuous transmission (DTX) methods and voice activity detectors
(VAD).

Many voice-processing systems use a fixed spectrum noise injection system, which is
quite suitable for wireline systems where the requirement is to match to random circuit noise
("white" noise), which is of equal amplitude per frequency over the channel bandwidth. A
problem occurs, however, because in nature the spectrum of acoustically derived background
noise does not correspond to random noise, but is produced by music, background from
traffic, car noise, or crowd noise (e.g., noise heard over a pay telephone in a restaurant). In
many cases, the comfort noise injection is more obnoxious than having no noise injection.

The desirable approach is to sample the noise during the speech gaps and derive a
noise model of the stationary element for both amplitude and spectrum; in other words, a
model comprising spectral and gain estimates. As known in the art, these estimates may be
determined on a broadband or sub-band basis. By deriving the stationary element, a sample
of random, spectrally and amplitude matched noise is available to use, less the non-stationary
elements that could cause a repeatable pattern during playback into the signal path. The
derived noise model can then be seamlessly (substantially unnoticed in the resultant audio)
injected into the signal path following the NLP, whilst the NLP is operated. The level of the
noise injection may be partially based upon NLP parameters to accommodate various levels
of muting or scrambling that might be taking place. Therefore the control for sampling the
noise and injecting the noise is common to the NLP control line (not known in prior art
systems) from the echo canceller filter shown in FIG. 4. For purposes of claim construction,
the term "injecting" refers to (means) substituting a noise signal for an NLP output, as well as
combining a noise signal with the NLP output.
Techniques for deriving the noise spectrum and amplitude generally appear in other
system designs, however among the differences between such other designs and the approach
taken in the context of the herein-described embodiment of the present invention is that the
system described herein makes alternative use of at least one aspect of a voice processor
system. In particular and in the context of the above-described and -illustrated EC and NR
system, resources associated with the noise reduction processor and system are used, during
what is effectively an idle period for traditional noise reduction processors (e.g. when the
NLP is operated), in a manner to improve the perceived quality of the communicated signal.

Referring back to FIG. 4, ordinarily when there is a signal from the near-end, the
noise reduction processor will be converging on the stationary element of the noise signal and
then applying a filter function to remove a defined amount of the stationary noise from the
signal. When the NLP is operated (to remove residual echo and background noise) the noise
reduction filter is "frozen," or in other words not updated or otherwise changed, so that the
model is not lost while the NLP is in operation. The noise reduction filter does not ordinarily
function to provide noise reduction during this period of NLP operation, but then resumes
operation once the NLP is no longer operated. In this way, as the noise spectrum and
amplitude change throughout, the filter processor can track the changes and efficiently reduce
the noise.

In the context of the present invention, the spectral and gain estimates maintained by
the noise reduction filter, which are typically frozen as described above, are referenced and
used in a new manner for the generation of a noise signal for injection into the
communication signal at the appropriate intervals (e.g., during operation of the NLP). One
example approach for using such filter coefficients in this manner to generate a noise signal
for injection is to use them to filter white noise that is internally generated. This noise could
be broadband noise that is then filtered by each sub-band weighting coefficient, or noise
generated independently for each sub-band and weighted by that sub-band's coefficient. In
either case, the generated noise then has the same spectral characteristics as the true or actual
background noise since the adaptive sub-band weighting coefficients converge to the spectral
coefficients of that noise. By using the gain estimate(s) to scale the spectrally matched noise,
the model is able to more accurately match the background noise.
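
The broadband variant of this approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation of the described system: the function names, the frame length of 64 samples, the uniform split of the spectrum into sub-bands, and the use of a naive DFT are all hypothetical choices made to keep the example self-contained.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def comfort_noise(band_weights, gain, frame_len=64, rng=None):
    """Shape internally generated white noise with frozen sub-band weights.

    band_weights and gain stand in for the spectral and gain estimates held
    by the noise reduction filter (hypothetical representation).
    """
    if rng is None:
        rng = random.Random()
    white = [rng.gauss(0.0, 1.0) for _ in range(frame_len)]
    spectrum = dft(white)
    half = frame_len // 2
    bins_per_band = (half + 1) / len(band_weights)
    for k in range(frame_len):
        # Mirror the upper bins so the shaped signal stays real-valued.
        freq = k if k <= half else frame_len - k
        band = min(int(freq / bins_per_band), len(band_weights) - 1)
        spectrum[k] *= band_weights[band]
    # The gain estimate scales the spectrally matched noise.
    return [gain * s for s in idft(spectrum)]
```

For example, `comfort_noise([1.0, 0.6, 0.2, 0.05], gain=0.01)` would produce one frame of low-level noise weighted toward the low frequencies, roughly as might be expected for car or traffic noise.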

In this way, at appropriate points during the conversation the noise reduction system
effectively contributes to noise generation, but not at the same time that the noise reduction
filter is operating to provide typical noise reduction. An example embodiment of this aspect
of the present invention is illustrated in FIG. 5. In particular, a transmitted voice signal is
provided to an echo canceller 502 and non-linear processor 504. The resulting signal is then
sent to an adaptive noise estimator/reducer 506. Additionally, a control signal 510 indicative
of the active/inactive state of the NLP 504 is sent to a noise reduction controller 508. In turn,
the noise reduction controller 508 provides a noise reduction control signal 512 to the
adaptive noise estimator/reducer 506. Thus, if the NLP 504 is inactive, the controller 508
configures the noise reduction control signal 512 to instruct the adaptive noise
estimator/reducer 506 to allow the noise estimator to adapt and subtract a portion of the noise
from the input voice signal. Conversely, if the NLP 504 is active, the controller 508
configures the noise reduction control signal 512 to instruct the adaptive noise
estimator/reducer 506 to freeze the noise estimation process and generate synthesized
background noise based on the current frozen background noise model. The synthesized
noise is thereafter added to the input signal.
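
The two modes selected by the control signal can be illustrated with a deliberately simplified Python sketch. It is a toy stand-in for the estimator/reducer 506, assuming a single broadband RMS value as the "noise model" and a plain attenuation as the "reduction"; the class name, the smoothing factor, and the reduction fraction are hypothetical, and a practical estimator would track only the stationary part of the signal, per sub-band.

```python
import math
import random


class NoiseEstimatorReducer:
    """Toy broadband stand-in for the adaptive noise estimator/reducer."""

    def __init__(self, smoothing=0.9, reduction=0.5, seed=0):
        self.noise_rms = 0.0          # the (frozen or adapting) noise model
        self.smoothing = smoothing    # hypothetical smoothing factor
        self.reduction = reduction    # fraction of the noise to remove
        self.rng = random.Random(seed)

    def process(self, frame, nlp_active):
        if nlp_active:
            # NLP active: leave the model frozen and add synthesized noise.
            return [s + self.rng.gauss(0.0, self.noise_rms) for s in frame]
        # NLP inactive: adapt the estimate, then attenuate part of the noise.
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        self.noise_rms = (self.smoothing * self.noise_rms
                          + (1.0 - self.smoothing) * rms)
        if rms == 0.0:
            return list(frame)
        scale = max(0.0, 1.0 - self.reduction * self.noise_rms / rms)
        return [scale * s for s in frame]
```

The point of the sketch is the branch on `nlp_active`: the same state that drives noise reduction in one mode is read, unchanged, to synthesize fill-in noise in the other.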

Tests have shown the resulting noise insertion system to have a good match in 
subjective listening tests and imperceptible operation in conversational tests for a wide range 
of program material. Even when there is a high content of non-stationary noise in the 
background noise, the loss of this detail in the returned signal to the far-end user is not 
considered disturbing since they are talking at this time and sensitivity to non-stationary noise 
is reduced. The far-end talker perceives disturbance in the stationary content most
strongly, and the present invention can be used to resolve this issue.

This same centralized system is used by the codec for its background noise estimate,
which is used to generate its SID (silence description) packets when DTX (discontinuous
transmission) or multi-rate transmission is active. The noise estimate used by the codec is
able to take into account NR, NLP, and noise injection levels and the noise spectrums. These
make DTX as unobtrusive as possible.

6. System Awareness and Optimization for Codec Frames and Packetization

Current voice processing systems (VPSs) synchronize the packetization engine to the
speech frames generated by codecs. This provides a natural packetization while reusing the
same buffering and signal delay for both purposes. This has been accomplished without
breaking the black-box approach to building a system, because the frame output of the codec
is simply incorporated into the packets.

Another way in which the integration of the NGVPS outperforms the current generation
of VPSs is by synchronizing the entire system to fixed boundaries, preferably the codec
frames, sub-frames or both. Referring again to FIG. 3, this is accomplished by the framing
control block 340 issuing at least one boundary control signal to the respective voice
processing blocks, which control signal informs the blocks of the boundaries. This provides
enhanced performance for a number of blocks. The ALC, NR, and EC functions of the
NGVPS are all enhanced.

ALC is used to add gain to low-level voice signals when too much transmission loss
is encountered or to reduce high-level speech signals, which may overdrive analog circuits at
the other end of the network. The intelligent block-to-block control coordinates the
interaction of the automatic gain control and the speech coder. Gain control changes are
synchronized with the frame boundaries of the speech coder. This allows the NGVPS to hold
the gain constant during the speech coder sub-frames and/or frames. By not changing the
gain during sub-frames and/or frames, coder performance is enhanced. Reducing the
variation of the signal level mid-frame improves the modeling of the speech by the encoder.
Mid-frame level changes require a trade-off in the coder's non-gain speech parameters. The
codebook search, for example, needs to select an excitation vector which, when played out
through the filter based on the LPC coefficients, would have a sudden increase in volume.
This does not fit the normal speech model very well and can dominate the selection of a
codebook vector, causing the more subtle characteristics to be overlooked. Depending on the
particular coder, each frame and/or sub-frame of the coded speech contains a gain parameter.
By synchronizing the ALC gain changes to these boundaries, the changes can be modeled in
the gain parameter without the degenerating effect on the selection of the other parameters.

The ALC algorithm is synchronized to the frames not only to coordinate its
gain adjustment times, but also to take advantage of the data-blocking required for the
codecs. An important part of an ALC system is the ability to minimize clipping due to
over-amplification. By synchronizing the ALC system to the data-blocks, the ALC system can
look at the entire block for clipping, and incorporate that into its gain selection.
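
A minimal Python sketch of frame-synchronous level control follows. It assumes 16-bit linear samples, an RMS level target, and one gain decision per codec frame; the function name and all parameter values are hypothetical and are not taken from the described system.

```python
import math


def alc_process(samples, frame_len, target_rms, max_gain=8.0, full_scale=32767):
    """Apply one gain per codec frame, limited so the frame cannot clip."""
    out, gains = [], []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / len(frame))
        peak = max(abs(s) for s in frame)
        # Desired gain moves the frame level toward the target.
        gain = max_gain if rms == 0 else min(max_gain, target_rms / rms)
        # Whole-block look: cap the gain so the largest sample stays in range.
        if peak * gain > full_scale:
            gain = full_scale / peak
        gains.append(gain)
        # The gain is constant for every sample within the frame.
        out.extend(gain * s for s in frame)
    return out, gains
```

Because the whole frame is already buffered for the codec, the peak check adds no delay of its own. A practical ALC would also smooth the gain from frame to frame, which this sketch omits.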

This same type of look-ahead is used to improve the VAD's performance. It is often
difficult to recognize changes in voice activity until some time after they happen. By adding
look-ahead to a VAD its performance can be improved. Some codecs such as G.729 and
G.723.1 require look-ahead data to perform their functions. Again, by coordinating the
data-blocks with the VAD function, the system VAD can use look-ahead without adding delay to
the system.

Many families of noise reduction algorithms, such as the NR algorithm currently
being sold by Tellabs, operate on blocks of data at a time. The blocking up of data adds
delay to these systems. Unfortunately, these systems are typically used in highly
delay-sensitive applications. The NR algorithms are typically fast Fourier transform (FFT) based
and require significant buffering. Wavelet-based algorithms and those requiring look-ahead
would also require buffers of data and have similar delay implications. The NGVPS
eliminates the additional buffering delay required in other systems by using the same
data-blocking delays associated with the codecs to perform noise reduction. The current
black-box systems do not have this level of synchronization between elements.
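
The synchronization described in this section can be pictured as a single frame clock that drives every block from one shared buffer. The Python sketch below is a hypothetical illustration of that idea only; the class name and the convention that each block is a callable taking and returning a frame are assumptions, not details of the framing control block 340.

```python
class FrameController:
    """Hypothetical stand-in for a centralized framing control block."""

    def __init__(self, frame_len):
        self.frame_len = frame_len
        self.blocks = []    # processing blocks, run in registration order
        self.buffer = []    # the one buffer shared by all blocks

    def register(self, block):
        self.blocks.append(block)

    def push(self, samples):
        """Buffer samples; at each frame boundary, run every block in turn."""
        frames_out = []
        self.buffer.extend(samples)
        while len(self.buffer) >= self.frame_len:
            frame = self.buffer[:self.frame_len]
            del self.buffer[:self.frame_len]
            for block in self.blocks:
                frame = block(frame)    # each block sees whole, aligned frames
            frames_out.append(frame)
        return frames_out
```

Since buffering happens once, in the controller, adding a block-based stage such as noise reduction does not add a second blocking delay.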

The system-wide awareness of the codec frame is also used to improve the operation
of the EC. This, along with the other features of the NGVPS EC, is explained in
Section 7.

7. Network Adaptive Advanced Echo Canceller with Codec Integration 

Another feature of the present invention that can significantly enhance voice quality is 
the inclusion of a far end echo canceller. Some of today's TDM carriers choose to cancel
echo in both directions using a single network element. These "duo" echo cancellers are 
most popular in wireless environments, where delay introduced in the wireless air interface 
creates the need to cancel echo in both directions; i.e. echo from the PSTN and wireless 
terminal. In a packet voice network, an operator may similarly choose to deploy a duo
canceller configuration, as the same condition exists. (Note that the term "packet network",
as used throughout this disclosure, is a specific example of a wider class of variable delay 
networks to which the present invention is applicable.) The packet network with speech 
compression adds delay to connections that might otherwise not need a canceller, as in 
wireless applications. FIG. 6 shows the duo layout comprising a near end echo canceller 602 
and a far end echo canceller 604. Notice that the far end or packet switch echo canceller has 
the packet network in its endpath. Packet networks are notorious for dropped packets and 
significant delay variation. Both of these impairments can severely affect the performance of 
a canceller. In a standard voice-over-x implementation, the packet processor has some 
knowledge of the lost packets and changes in endpath delay. By sharing this information
with the far end echo canceller and by subsequently using this information to intelligently 
control the canceller's behavior, the detrimental effects created by the packet network are 
minimized. In other words, voice quality is optimized. Some advanced TDM networks 
being created for the wireless world may also have changing endpath delay.

This advanced echo canceller (AEC) has several new features. First, it is
synchronized to packet boundaries and can disable both coefficient update and echo
cancellation on a packet by packet basis. When a packet is lost and has to be replaced using
lost or errored packet substitution, the coefficients are frozen and echo cancellation is
disabled. If echo cancellation were not disabled, subtracting out the estimated echo response
would actually add echo. This would result because the substituted packet would be so
different from the lost packet that subtracting the actual echo would effectively be adding the
negative of the echo to this signal.
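
The packet-by-packet freeze can be sketched with a small adaptive filter. The example below assumes a normalized LMS (NLMS) update, which is a common choice but is not stated in the text; the class name, the step size, and the `lost` flag supplied by the packet processor are all hypothetical.

```python
class PacketAwareEchoCanceller:
    """NLMS echo canceller that can be bypassed and frozen per packet."""

    def __init__(self, taps, mu=0.5):
        self.h = [0.0] * taps          # echo path estimate (h vector)
        self.history = [0.0] * taps    # reference samples, newest first
        self.mu = mu                   # hypothetical NLMS step size

    def process_packet(self, reference, received, lost):
        out = []
        for x, d in zip(reference, received):
            self.history = [x] + self.history[:-1]
            if lost:
                # Substituted packet: no cancellation, no coefficient update.
                out.append(d)
                continue
            estimate = sum(c * v for c, v in zip(self.h, self.history))
            error = d - estimate
            norm = sum(v * v for v in self.history) + 1e-9
            self.h = [c + self.mu * error * v / norm
                      for c, v in zip(self.h, self.history)]
            out.append(error)
        return out
```

The reference history keeps advancing during a lost packet, so that cancellation can resume with correctly aligned samples on the next good packet.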

In a more advanced version, the packet substitution algorithm does not base the 
replacement packet on the previously received packets, but on the echo cancelled versions of 
these packets.

Another feature of this AEC is that it is integrated with a decoder that receives the 
same silence description (SID) information sent to the far-end. This enables the near end EC 
to construct the signal being generated at the far-end. Normally, the SID packets only 
contain spectral information, which the far end uses to filter randomly chosen excitation 
vectors. As a result, the accuracy of the reconstructed signal at the far end is limited to the
spectral characteristics conveyed by the SID information. However, when the far-end codec 
is part of the end-to-end system, as with the present invention, it is possible to synchronize 
the local random codebook excitation selection with that being used at the far-end. Such 
synchronization may take advantage of any unused bits in the SID packets, which are usually
the same size as the regular speech packets but only contain spectral information. The unused
bits corresponding to the codebook excitation are available for random number generator 
synchronization between the two ends. This allows the AEC to have access to the signal that 
is echoing back, even when DTX is active and comfort noise generation is taking place at the
far-end in response to SID packets. Without this feature, the EC would not know what signal
was being echoed back and would have to disable coefficient updates. A secondary issue
with not having this feature is that any echoed noise would have to be left in the received
signal. Preferably, this decoder is active even for non-SID packets. This helps to reduce the
nonlinearity of the endpath by modeling the effect of the coder-decoder combination in one
direction.
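
The effect of synchronizing the random excitation can be shown with a short Python sketch. It assumes that a seed is carried in the spare SID bits and that comfort noise is produced by an all-pole synthesis filter driven by Gaussian excitation; the function name, the coefficient convention, and the seed mechanism are hypothetical, and real codecs define their own SID formats and excitation generators.

```python
import random


def synthesize_comfort_noise(lpc, gain, seed, n):
    """Drive an all-pole synthesis filter with seeded random excitation.

    lpc holds hypothetical predictor coefficients a_1..a_p for
    y[t] = e[t] - sum(a_k * y[t - k]); seed is assumed to be shared
    between the two ends via unused SID bits.
    """
    rng = random.Random(seed)
    memory = [0.0] * len(lpc)    # previous outputs, most recent first
    out = []
    for _ in range(n):
        excitation = gain * rng.gauss(0.0, 1.0)
        y = excitation - sum(a * m for a, m in zip(lpc, memory))
        memory = [y] + memory[:-1]
        out.append(y)
    return out
```

With the same spectral parameters and the same seed, the near-end decoder reproduces sample for sample what the far end plays out, which gives the echo canceller a usable reference during DTX. With different seeds the two signals share only their spectrum.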

A last feature of the AEC is the ability for the echo cancellers at either end to move 
their respective h vectors (i.e., time domain transfer function) in response to changes in delay 
in their respective endpaths. As known in the art, such h vectors model the delay 
characteristics giving rise to echo conditions. In this regard, each end of the AEC maintains 
jitter buffers, which adjust in response to network conditions. At the end local to a given EC,
the EC receives information from its local jitter buffer and moves the effective locations of 
the h vector's coefficients in response to the buffer adjustments. Additionally, or 
alternatively, the EC also monitors its echo return loss enhancement (ERLE) metric. If the
ERLE degrades past one or more thresholds, the EC checks whether the delay has changed
and, if so, adjusts its h vector's coefficient locations accordingly. In this way the
AEC can accommodate delay changes that occur and are not under the NGVPS's control. 
These types of delay changes can occur due to adjustments in other network buffers. 
Furthermore, information regarding changes to delay characteristics determined at one end 
can be forwarded to the other end so that the effects of the changed delay can be accounted 
for at both ends. For example, if the far end detects a change in delay characteristics having
an effect on an echo path manifested at the near end, the far end can send information 
regarding the change in delay to the near end so that it can begin to adjust its coefficients in 
anticipation of receiving the audio impacted by the changed delay.
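
Moving the coefficient locations amounts to shifting the h vector by the number of samples by which the endpath delay changed. The following Python sketch shows that shift together with a simple ERLE calculation; the function names are hypothetical, and the delay change is assumed to be reported in whole samples (for example by the local jitter buffer).

```python
import math


def erle_db(echo_in, residual):
    """ERLE in dB: echo power before cancellation over residual power.

    Assumes both signals have nonzero power.
    """
    p_in = sum(s * s for s in echo_in)
    p_out = sum(s * s for s in residual)
    return 10.0 * math.log10(p_in / p_out)


def shift_coefficients(h, delay_change):
    """Move h-vector taps later (positive change) or earlier (negative)."""
    n = len(h)
    if delay_change >= 0:
        # Delay grew: pad the front with zeros and drop taps off the end.
        return ([0.0] * min(delay_change, n) + list(h))[:n]
    # Delay shrank: drop leading taps and pad the end with zeros.
    k = min(-delay_change, n)
    return list(h[k:]) + [0.0] * k
```

Shifting the existing taps keeps the shape of the learned echo response, so the canceller only has to re-adapt to small differences instead of reconverging from zero.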

These features are also applicable to certain TDM networks, particularly those in the 
wireless world where speech compression and DTX can create many of the same problems
that the AEC addresses for packet network applications.

While the foregoing detailed description sets forth presently preferred embodiments 
of the invention, it will be understood that many variations may be made to the embodiments 
disclosed herein without departing from the true spirit and scope of the invention. This true
spirit and scope of the present invention is defined by the appended claims, to be interpreted
in light of the foregoing specification.

Claims

What is claimed is: 

1. In a communication system comprising a plurality of voice processing blocks used to
process a transmitted voice signal, a method for controlling operation of the plurality of voice
processing blocks, the method comprising steps of:

providing a centralized frame controller coupled to the plurality of voice processing
blocks;

providing, by the centralized frame controller, at least one boundary control signal to
the plurality of voice processing blocks,

wherein operation of each of the plurality of voice processing blocks on the
transmitted voice signal is dependent in part upon the at least one boundary control signal.



2. The method of claim 1, wherein the at least one boundary control signal is determined
based on at least one of a frame boundary and a plurality of sub-frame boundaries
corresponding to operation of a speech codec.

3. The method of claim 2, wherein the frame boundary and sub-frame boundaries 
correspond to the codec frame and sub-frame boundaries. 

4. The method of claim 2, wherein the plurality of voice processing blocks comprises at
least one automatic level control circuit. 

5. The method of claim 3, wherein the at least one boundary control signal delineates
periods of time, and wherein the at least one automatic level control circuit, in response to the
at least one boundary control signal, maintains a gain factor at constant levels during each
period of time.

6. The method of claim 4, wherein the at least one automatic level control circuit, for
each period of time, determines the gain factor by analyzing a portion of the transmitted voice
signal delimited by the period of time.

7. The method of claim 1, wherein the plurality of voice processing blocks comprises at
least one noise reduction circuit.

8. The method of claim 6, wherein the at least one boundary control signal delineates
periods of time, and wherein the at least one noise reduction circuit, in response to the at least
one boundary control signal and for each period of time, performs noise reduction processing
on a portion of the transmitted voice signal delineated by the period of time.

9. The method of claim 1, wherein the plurality of voice processing blocks comprises at
least one echo canceller.

10. The method of claim 8, wherein the at least one boundary control signal delineates
periods of time, and wherein the at least one echo canceller, in response to the at least one
boundary control signal and for each period of time, performs echo cancellation processing
on a portion of the transmitted voice signal delineated by the period of time.

11. An apparatus for processing a transmitted voice signal, comprising:

a plurality of voice processing blocks that each operate upon the transmitted voice
signal;

a centralized frame controller, coupled to each of the plurality of voice processing
blocks, that provides at least one boundary control signal to the plurality of voice processing
blocks,

wherein operation of each of the plurality of voice processing blocks on the
transmitted voice signal is dependent in part upon the at least one boundary control signal.

12. The apparatus of claim 11, further comprising a programmable processor coupled to a
storage device, wherein the centralized frame controller is implemented via instructions
executed by the programmable processor and stored in the storage device.

13. The apparatus of claim 11, wherein the centralized frame controller defines the at
least one boundary control signal based on at least one of a frame boundary and a plurality of
sub-frame boundaries corresponding to operation of a speech codec.




14. The apparatus of claim [illegible], wherein the plurality of voice processing blocks comprises
at least one automatic level control circuit.

15. The apparatus of claim 14, wherein the at least one boundary control signal delineates
periods of time, and wherein the at least one automatic level control circuit, in response to the
at least one boundary control signal, maintains a gain factor at constant levels during each
period of time.

16. The apparatus of claim 15, wherein the at least one automatic level control circuit, for
each period of time, determines the gain factor by analyzing a portion of the transmitted voice
signal delimited by the period of time.

17. The apparatus of claim 11, wherein the plurality of voice processing blocks comprises
at least one noise reduction circuit.

18. The apparatus of claim 17, wherein the at least one boundary control signal delineates
periods of time, and wherein the at least one noise reduction circuit, in response to the at least
one boundary control signal and for each period of time, performs noise reduction processing
on a portion of the transmitted voice signal delineated by the period of time.

19. The apparatus of claim 11, wherein the plurality of voice processing blocks comprises
at least one echo canceller.

20. The apparatus of claim 19, wherein the at least one boundary control signal delineates
periods of time, and wherein the at least one echo canceller, in response to the at least one
boundary control signal and for each period of time, performs echo cancellation processing
on a portion of the transmitted voice signal delineated by the period of time.



21. In a communication system comprising at least one echo canceller coupled to a 
variable-delay network, a method comprising steps of: 

determining, by a first echo canceller of the at least one echo canceller, delay
characteristics related to at least one voice signal received via the variable delay network;

determining, by the first echo canceller, that delay characteristics corresponding to the
at least one voice signal have changed; and

modifying, by the first echo canceller, echo cancellation processing on the at least one
voice signal in response to the changed delay characteristics.

22. The method of claim 21, further comprising a step of:

prior to the step of determining that the delay characteristics have changed,
determining that echo cancellation performance has degraded.

23. The method of claim 21, wherein the step of determining that the delay characteristics
have changed further comprises inspecting a jitter buffer used to store the at least one voice
signal.

24. The method of claim 21, wherein the at least one voice signal is comprised of a
plurality of packets, the method further comprising steps of:

correlating the delay characteristics with a portion of the plurality of packets,

wherein the step of modifying further comprises discontinuing echo cancellation
processing for the portion of the plurality of packets.

25. The method of claim 24, wherein the step of modifying further comprises substituting
previously echo cancelled packets for missing packets of the plurality of packets.

26. The method of claim 21, wherein the step of modifying further comprises adjusting a
time domain transfer function used to perform the echo cancellation processing.

27. The method of claim 21, wherein the step of determining that the delay characteristics
have changed further comprises receiving information regarding changes to delay
characteristics corresponding to a second echo canceller of the at least one echo canceller.

28. In a communication system comprising at least two echo cancellers, a method
comprising steps of:

determining, by a first echo canceller of the at least two echo cancellers, silence
descriptor information related to a portion of a transmitted voice signal sent from the first
echo canceller to a second echo canceller of the at least two echo cancellers;

transmitting, by the first echo canceller to the second echo canceller, the silence
descriptor information and excitation vector information; and

reconstructing, by the second echo canceller, the portion of the transmitted voice
signal based in part upon the silence descriptor information and the excitation vector
information.




WO 01/33814 



PCT/US00/30298 



29. The method of claim 28, wherein the silence descriptor information comprises 
spectral information regarding the portion of the transmitted signal. 

30. The method of claim 29, wherein the excitation vector information identifies a
particular excitation vector that, when filtered according to the spectral information, provides
an estimate of the portion of the transmitted signal.

31. The method of claim 28, further comprising steps of:

receiving, by the first echo canceller from the second echo canceller, a received voice
signal based in part upon the silence descriptor information and the excitation vector
information; and

modifying, by the first echo canceller, echo cancellation processing on the transmitted
voice signal based on the received voice signal.



32. In a communication system comprising a plurality of voice processing blocks used to 
process at least one voice signal, a method for controlling operation of the plurality of voice 
processing blocks, the method comprising steps of: 

providing a centralized voice activity detector coupled to the plurality of voice
processing blocks;

performing, by the centralized voice activity detector, a first type of voice activity
analysis on a transmitted voice signal of the at least one voice signal; and

providing, by the centralized voice activity detector, at least one voice activity
indication to the plurality of voice processing blocks in response to the first type of voice
activity analysis,

wherein operation of each of the plurality of voice processing blocks on the
transmitted voice signal is dependent in part upon the at least one voice activity indication.

33. The method of claim 32, wherein the plurality of voice processing blocks comprises 
30 any combination of: a noise reduction circuit, an automatic level control circuit, an echo 
canceller, and a speech encoder. 
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34 The method of claim 32, wherein the at least one voice activity indication comprises 
at least two voice activity indications, and wherein the at least two voice activity ind.cat.ons 
are based on uniquely corresponding, non-identical voice activity thresholds. 

35. The method of claim 34, wherein one of the at least two voice activity indications 
comprises a binary indication. 

36. The method of claim 34, wherein one of the at least two voice activity indications 
comprises a probabilistic indication. 

37 The method of data 32, wherein the step of providing *e at leas, one voiee activity 
indication further comprises providing each of the at leas, one voice activity indication to a 
subset of the plurality of voice processing blocks. 

,5 38 The method of claim 32, wherein the step ofperfotming further comprises performing 
a second type of voice processing analysis on a received voice signal of the a, leas, one votce 
signa. and wherein the step of providing the at leas, one voice activity indication further 
comprises providing the at leas, one voice activity indication only in response ,0 me second 
type of voice activity analysis. 

39. The method of claim 38, wherein the step of providing the at least one voice activity
indication further comprises providing the at least one voice activity indication in response to 
the first and the second type of voice activity analysis. 

40. The method of claim 32, further comprising steps of:

providing a centralized noise estimator coupled to the centralized voice activity 
detector and at least a portion of the plurality of voice processing blocks; 

performing, by the centralized noise estimator, at least one type of noise estimation 
analysis on the transmitted voice signal; and 

providing, by the centralized noise estimator, at least one noise estimate to the portion
of the plurality of voice processing blocks in response to the at least one type of noise
estimation analysis,

wherein operation of each of the portion of the plurality of voice processing blocks on 
the transmitted voice signal is dependent in part upon the at least one noise estimate. 
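
Claim 40 couples the centralized noise estimator to the centralized voice activity detector. One plausible use of that coupling, sketched below, is to update the noise estimate only on frames the detector marks as non-speech. The gating rule, the exponential smoothing, and the names (`CentralNoiseEstimator`, `update`) are assumptions for illustration. The claim itself states only that the two elements are coupled.

```python
from typing import List


class CentralNoiseEstimator:
    """Single noise-power estimate shared by several processing blocks."""

    def __init__(self, initial_power: float = 1e-6, smoothing: float = 0.9):
        self.noise_power = initial_power
        self.smoothing = smoothing  # closer to 1.0 means slower tracking

    def update(self, frame: List[float], voice_active: bool) -> float:
        """Track noise power on non-speech frames; hold it during speech."""
        if not voice_active and frame:
            frame_power = sum(x * x for x in frame) / len(frame)
            self.noise_power = (self.smoothing * self.noise_power
                                + (1.0 - self.smoothing) * frame_power)
        return self.noise_power
```

Every block that needs a noise estimate would read the same `noise_power` value rather than maintaining its own.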

41. A computer-readable medium having stored thereon computer-executable instructions
for performing the method of claim 32. 

42. In a communication system comprising a plurality of voice processing blocks used to
process a transmitted voice signal, a method for controlling operation of the plurality of voice 
processing blocks, the method comprising steps of: 

providing a centralized noise estimator coupled to the plurality of voice processing
blocks;

performing, by the centralized noise estimator, at least one type of noise estimation
analysis on the transmitted voice signal; and

providing, by the centralized noise estimator, at least one noise estimate to the 
plurality of voice processing blocks in response to the at least one type of noise estimation 
analysis, 

wherein operation of each of the plurality of voice processing blocks on the
transmitted voice signal is dependent in part upon the at least one noise estimate.



43. The method of claim 42, wherein the plurality of voice processing blocks comprises 
any combination of: a noise reduction circuit, a non-linear processor, a voice activity
detector, and a speech encoder.

44. The method of claim 42, wherein the at least one type of noise estimation analysis 
comprises broadband analysis and sub-band analysis, and wherein the at least one noise 
estimate comprises a broadband noise estimate based on the broadband analysis and a
sub-band noise estimate based on the sub-band analysis.

45. The method of claim 42, wherein the transmitted voice signal is subjected to echo 
cancellation processing prior to the step of performing the at least one type of noise 
estimation analysis on the transmitted voice signal. 



46. The method of claim 45, wherein the step of performing the at least one type of noise 
estimation analysis is performed before the transmitted voice signal is subject to non-linear 
processing. 
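
Claims 44 to 46 can be read together as follows: the noise estimator sees the signal after echo cancellation but before non-linear processing, and produces both a broadband estimate and per-band estimates. The sketch below shows one way this could look. The naive DFT, the equal-width band split, the subtraction-based echo canceller, and the muting non-linear processor are all simplifying assumptions, and the function names are hypothetical.

```python
import cmath
from typing import List, Tuple


def power_spectrum(frame: List[float]) -> List[float]:
    """One-sided power spectrum via a naive DFT (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def noise_estimates(frame: List[float], num_bands: int = 4) -> Tuple[float, List[float]]:
    """Return (broadband power, list of sub-band powers) for one frame."""
    broadband = sum(x * x for x in frame) / len(frame)
    spectrum = power_spectrum(frame)
    # Equal-width bands over the one-sided spectrum; the last band takes any remainder.
    edges = [i * len(spectrum) // num_bands for i in range(num_bands + 1)]
    sub_band = [sum(spectrum[edges[i]:edges[i + 1]]) for i in range(num_bands)]
    return broadband, sub_band


def transmit_path(frame: List[float], echo_estimate: List[float], nlp_active: bool):
    """Order of operations: echo cancellation, noise analysis, then NLP."""
    residual = [s - e for s, e in zip(frame, echo_estimate)]  # assumed echo replica subtraction
    estimates = noise_estimates(residual)                     # tapped before the NLP
    output = [0.0] * len(residual) if nlp_active else residual  # crude NLP stand-in (mute)
    return output, estimates
```

Tapping the signal before the non-linear processor means the estimate is not disturbed when the processor mutes or attenuates the residual.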

47. The method of claim 42, further comprising steps of:

providing a centralized voice activity detector coupled to the centralized noise 
estimator and at least a portion of the plurality of voice processing blocks; 

performing, by the centralized voice activity detector, voice activity analysis on the
transmitted voice signal; and

providing, by the centralized voice activity detector, at least one voice activity 
indication to the portion of the plurality of voice processing blocks in response to the voice 
activity analysis, 

wherein operation of each of the portion of the plurality of voice processing blocks on 
the transmitted voice signal is dependent in part upon the at least one voice activity
indication. 

48. A computer-readable medium having stored thereon computer-executable instructions 
for performing the method of claim 42. 




49. An apparatus for processing at least one voice signal, comprising: 

a plurality of voice processing blocks that each operate upon the at least one voice 
signal; and 

a centralized voice activity detector, coupled to each of the plurality of voice 
processing blocks, that performs at least one type of voice activity analysis on the at least one
voice signal and provides at least one voice activity indication to the plurality of voice 
processing blocks in response to the at least one type of voice activity analysis, 

wherein operation of each of the plurality of voice processing blocks on the at least 
one voice signal is dependent in part upon the at least one voice activity indication. 

50. The apparatus of claim 49, wherein the plurality of voice processing blocks comprises
any combination of: a noise reduction circuit, an automatic level control circuit, an echo 
canceller, and a speech encoder. 

51. The apparatus of claim 49, wherein the at least one voice signal comprises a
transmitted voice signal and the centralized voice activity detector performs a first type of 
voice activity analysis on the transmitted voice signal. 

52. The apparatus of claim 51, wherein the at least one voice signal comprises a received
voice signal and the centralized voice activity detector performs a second type of voice 
activity analysis on the received voice signal. 

53. The apparatus of claim 49, wherein the at least one voice signal comprises a received
voice signal and the centralized voice activity detector performs a second type of voice 
activity analysis on the received voice signal. 

54. The apparatus of claim 49, further comprising a programmable processor coupled to a 
storage device, wherein the centralized voice activity detector is implemented via instructions
executed by the programmable processor and stored in the storage device.

55. The apparatus of claim 49, further comprising: 

a centralized noise estimator, coupled to at least a portion of the plurality of voice 
processing blocks and the centralized voice activity detector, that performs at least one type
of noise estimation analysis on the at least one voice signal and provides at least one noise 
estimate to the portion of the plurality of voice processing blocks in response to the at least 
one type of noise estimation analysis, 

wherein operation of each of the portion of the plurality of voice processing blocks on 
the at least one voice signal is dependent in part upon the at least one noise estimate.

56. An apparatus for processing at least one voice signal, comprising: 

a plurality of voice processing blocks that each operate upon the at least one voice 
signal; and 

a centralized noise estimator, coupled to each of the plurality of voice processing
blocks, that performs at least one type of noise estimation analysis on the at least one voice
signal and provides at least one noise estimate to the plurality of voice processing blocks in 
response to the at least one type of noise estimation analysis, 

wherein operation of each of the plurality of voice processing blocks on the at least
one voice signal is dependent in part upon the at least one noise estimate.

57. The apparatus of claim 56, wherein the plurality of voice processing blocks comprises 
any combination of: a noise reduction circuit, a non-linear processor, a voice activity 
detector, and a speech encoder. 

58. The apparatus of claim 56, further comprising a programmable processor coupled to a
storage device, wherein the centralized noise estimator is implemented via instructions
executed by the programmable processor and stored in the storage device.

59. The apparatus of claim 56, further comprising:

a centralized voice activity detector, coupled to at least a portion of the plurality of 
voice processing blocks and the centralized noise estimator, that performs at least one type of 
voice activity analysis on the at least one voice signal and provides at least one voice activity 
indication to the portion of the plurality of voice processing blocks in response to the at least
one type of voice activity analysis,

wherein operation of each of the portion of the plurality of voice processing blocks on
the at least one voice signal is dependent in part upon the at least one voice activity
indication.

60. In a communication system comprising a plurality of voice processing blocks used to
process a transmitted voice signal, a method for controlling operation of the plurality of voice
processing blocks, the method comprising steps of:

providing a centralized signal characteristic estimator coupled to the plurality of voice
processing blocks;

performing, by the centralized signal characteristic estimator, at least one type of
signal characteristic estimation analysis on the transmitted voice signal; and

providing, by the centralized signal characteristic estimator, at least one signal
characteristic estimate to the plurality of voice processing blocks in response to the at least
one type of noise estimation analysis,

wherein operation of each of the plurality of voice processing blocks on the
transmitted voice signal is dependent in part upon the at least one signal characteristic
estimate.

61. The method of claim 60, wherein the plurality of voice processing blocks comprises
any combination of: a noise reduction circuit, a non-linear processor, a voice activity
detector, and a speech encoder.

62. The method of claim 60, wherein the at least one type of signal characteristic
estimation analysis comprises broadband analysis and sub-band analysis, and wherein the at
least one signal characteristic estimate comprises a broadband signal characteristic estimate
based on the broadband signal characteristic analysis and a sub-band signal characteristic
estimate based on the sub-band analysis.

63. The method of claim 60, wherein the transmitted voice signal is subjected to echo 
cancellation processing prior to the step of performing the at least one type of signal 
characteristic estimation analysis on the transmitted voice signal. 

64. The method of claim 63, wherein the step of performing the at least one type of signal 
characteristic estimation analysis is performed before the transmitted voice signal is subject 
to non-linear processing. 

65. The method of claim 60, further comprising steps of:

providing a centralized voice activity detector coupled to the centralized signal 
characteristic estimator and at least a portion of the plurality of voice processing blocks; 

performing, by the centralized voice activity detector, voice activity analysis on the
transmitted voice signal; and

providing, by the centralized voice activity detector, at least one voice activity
indication to the portion of the plurality of voice processing blocks in response to the voice
activity analysis,

wherein operation of each of the portion of the plurality of voice processing blocks on 
the transmitted voice signal is dependent in part upon the at least one voice activity 
indication.

66. A computer-readable medium having stored thereon computer-executable instructions 
for performing the method of claim 60. 

67. In a communication system comprising a plurality of voice processing blocks used to
process at least one voice signal, a method for combining the signal operations of the 
plurality of voice processing blocks, the method comprising steps of: 

computing the combined signal adjustment from at least substantially all of the voice 
processing blocks; 

adjusting the input signal in response to said step of computing the combined signal 
adjustment. 
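
Claim 67 describes computing a single combined adjustment from the voice processing blocks and applying it to the input signal once, rather than letting each block scale the signal in turn. A minimal sketch follows, assuming each block reports its requested adjustment as a gain in dB, that the gains combine by addition in the dB domain, and that the total is clamped to a safe range. The block names, the clamp limits, and the function names are illustrative assumptions.

```python
from typing import Dict, List


def combined_gain_db(block_gains_db: Dict[str, float],
                     min_db: float = -30.0, max_db: float = 12.0) -> float:
    """Sum the per-block gain requests (dB) and clamp the total."""
    total = sum(block_gains_db.values())
    return max(min_db, min(max_db, total))


def apply_adjustment(frame: List[float], gain_db: float) -> List[float]:
    """Apply the combined adjustment to the input signal in one step."""
    scale = 10.0 ** (gain_db / 20.0)
    return [x * scale for x in frame]


# Hypothetical requests from three blocks for the current frame.
requests = {"noise_reduction": -6.0, "automatic_level_control": 3.0, "non_linear_processor": -3.0}
adjusted = apply_adjustment([1.0, -0.5], combined_gain_db(requests))
```

Applying one combined gain avoids intermediate rounding between blocks and makes it straightforward to bound the total adjustment.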

68. A method of compensating for background noise modulation caused by operation of a 
non-linear processor on an audio signal, the method comprising steps of:

determining a background noise model for the audio signal;

reducing background noise in the audio signal based on the background noise model 
when the non-linear processor is not operating on the audio signal; and 

injecting synthesized background noise based on the background noise model into the
audio signal when the non-linear processor is operating on the audio signal.

69. The method of claim 68, wherein the step of determining the background noise model
further comprises adaptively determining filter coefficients representative of the background
noise.



70. The method of claim 69, wherein the step of reducing the background noise further
comprises steps of:

generating the synthesized background noise based on the filter coefficients; and

subtracting the synthesized background noise from the audio signal.

71. The method of claim 69, wherein the step of injecting the synthesized background
noise further comprises steps of:

discontinuing adaptive determination of the filter coefficients when the non-linear
processor is operating on the audio signal to provide fixed filter coefficients;

generating the synthesized background noise based on the fixed filter coefficients; and

adding the synthesized background noise to the audio signal.

72. A computer-readable medium having stored thereon computer-executable instructions 
for performing the method of claim 68. 
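
The two operating modes of claims 68 to 71 can be sketched as a small state switch: while the non-linear processor is idle, the filter coefficients adapt and synthesized noise is subtracted, and while it is operating, adaptation stops and synthesized noise from the fixed coefficients is added. The code below follows the claim steps literally under strong simplifying assumptions: the "filter" is a single gain coefficient tracking the frame RMS, the excitation is Gaussian white noise, and adaptation is not gated by voice activity. A practical noise reducer might instead subtract in the spectral-magnitude domain. All names are hypothetical.

```python
import random
from typing import List


class NoiseCompensator:
    def __init__(self, step: float = 0.1, seed: int = 0):
        self.coefficients = [0.0]       # one-tap shaping filter (assumed model order)
        self.step = step                # adaptation rate (assumed)
        self.rng = random.Random(seed)  # white-noise excitation source

    def _adapt(self, frame: List[float]) -> None:
        """Move the coefficient toward the RMS level of the current frame."""
        rms = (sum(x * x for x in frame) / len(frame)) ** 0.5
        self.coefficients[0] += self.step * (rms - self.coefficients[0])

    def _synthesize(self, length: int) -> List[float]:
        """Generate synthesized background noise from the current coefficients."""
        return [self.coefficients[0] * self.rng.gauss(0.0, 1.0) for _ in range(length)]

    def process(self, frame: List[float], nlp_operating: bool) -> List[float]:
        if nlp_operating:
            # Adaptation is discontinued: coefficients stay fixed, noise is added.
            noise = self._synthesize(len(frame))
            return [x + n for x, n in zip(frame, noise)]
        # NLP idle: keep adapting the model and subtract the synthesized noise.
        self._adapt(frame)
        noise = self._synthesize(len(frame))
        return [x - n for x, n in zip(frame, noise)]
```

Freezing the coefficients while the non-linear processor operates keeps the injected noise matched to the last observed background rather than to the processor's attenuated output.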

73. In a communication system comprising a non-linear processor coupled to a noise
reduction circuit, a method for the noise reduction circuit to compensate for background noise
modulation caused by operation of the non-linear processor on an audio signal, the method 
comprising steps of: 

receiving a first indication that the non-linear processor is not operating on the audio
signal;

reducing background noise in the audio signal in response to the first indication;

receiving a second indication that the non-linear processor is operating on the audio
signal; and

injecting synthesized background noise into the audio signal in response to the second 
indication. 

74. The method of claim 73, wherein the step of injecting the synthesized background 
noise into the audio signal further comprises steps of:

discontinuing adaptive determination of a background noise model to provide a fixed
background noise model;

generating the synthesized background noise based on the fixed background noise
model; and

adding the synthesized background noise to the audio signal.

75. A computer-readable medium having stored thereon computer-executable instructions 
for performing the steps of claim 73. 

76. An apparatus that compensates for background noise modulation caused by operation
of a non-linear processor on an audio signal, the apparatus comprising: 

means for determining a background noise model for the audio signal when the
non-linear processor is not operating on the audio signal;

means for reducing background noise in the audio signal based on the background
noise model when the non-linear processor is not operating on the audio signal; and

means for injecting synthesized background noise based on the background noise 
model into the audio signal when the non-linear processor is operating on the audio signal. 

77. The apparatus of claim 76, wherein the means for determining the background noise 
model further operate to adaptively determine filter coefficients representative of the
background noise.

78. The apparatus of claim 77, wherein the means for reducing the background noise 
further comprise: 

means for generating [...]

[...]

79. [...] operate to discontinue adaptive determination of the [...]

means for generating the synthesized background noise based [...]

80. An apparatus that compensates for background noise modulation caused by operation
of a non-linear processor on an audio signal, the apparatus comprising:

a controller, coupled to the non-linear processor, that [...] information from the
non-linear processor and that provides a noise reduction control signal as output; and

a noise reduction circuit, coupled to the [...] and receiving the noise reduction
control signal and coupled to the non-linear processor [...] audio signal, that
reduces background noise in the audio signal when the noise reduction control signal is
asserted and that injects synthesized background noise into the audio signal when the noise
reduction control signal is not asserted.

81. [...] filter coefficients representative of the background noise; and

a generation circuit that takes as input the filter coefficients and provides synthesized
background noise as output.

82. The apparatus of claim 81, wherein the noise reduction circuit further comprises:

[...] when the noise reduction control signal is asserted.

83. The apparatus of claim 82, wherein the adaptive filter discontinues adaptive
determination of the filter coefficients to provide fixed filter coefficients, the generation
circuit provides the synthesized background noise based on the fixed filter coefficients, and
the combiner adds the synthesized background noise to the audio signal when the noise 
reduction control signal is not asserted. 

84. A method of compensating for background noise modulation caused by operation of a 
non-linear processor on an audio signal, the method comprising steps of: 

determining a background noise model for the audio signal; 

reducing background noise in the audio signal based on the background noise model;
and

injecting synthesized background noise based on at least one of: the background noise 
model and the state of the non-linear processor. 